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Abstract 

We decompose a matrix Y into a sum of bilinear terms in a stepwise 
manner, by considering Y as a mapping from the finite dimensional 
space Z” to the space We provide transition formulas, and represent 
them in a duality diagram, thus generalizing the well known duality 
diagram in the french school of data analysis. As an application, we 
introduce a family of Euclidean multidimensional scaling models. 
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1 Introduction 

Matrix factorization, named also decomposition, in data analysis is at the 
core of factor analysis; and one of its principal aims, as clearly stated by 
Hubert et al. (2000), is to visualize geometrically the statistical association 
existing among the rows or the columns of the matrix. So the way that we 
factorize a matrix is of fundamental interest and concern in statistics. What 
is surprising is that the oldest method, the centroid factorization, see Burt 
(1917) and Thurstone (1931), has been rediscovered recently many times, see 
for instance proposal 1 in McCoy and Tropp (2011). Singular value decom¬ 
position (SVD) is the most used matrix decomposition method in statistics; 
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the aim of this paper is to present in a coherent way the theory of SVD- 
like matrix factorizations based on subordinate or induced norms; and at 
the same time, review the existing literature. This presentation generalizes 
the SVD by embedding it in a larger family: It belongs to the class of op¬ 
timal biconjugate decompositions; biconjugate decompositions are based on 
Wedderburn rank-one reduction theorem as described by Chu et ah (1995). 
Other alternative generalization of SVD, GSVD, is presented by Hubert et 
al. (2000), and which forms the basis of the french school of data analysis 
as reviewed recently by Holmes (2008) and De La Cruz and Holmes (2011). 
We also incorporate the GSVD in our representation. 

This paper is organized as follows: Section 2 presents the preliminar¬ 
ies concerning induced or subordinate matrix norms; section 3 presents the 
matrix factorizations based on induced norms, and we conclude in section 4. 

2 Preliminaries on real Banach spaces 

We start with some preliminaries and at the same time introduce notation. 
We note: 

Ip := (R 11.1 Ip) is a hnite dimensional Banach space; that is, R ” is 
n-dimensional complete vector space with the p-norm, ||.||p, for p > 1. For 
an X G R its p-norm is dehned as ||x||i = 1^*1 P = 1) 11^1 Ip = 

(X]r=i for p > 1, and ||x||oo = maxjL;^ |a;j| for p = oo. 

The norm ||x||p has the following four properties 

(Nl) ||x||p > 0 

(N2) I |x| Ip = 0 iff X = 0 

(N3) ||q;x||p = Q;||x||p for a G R 

(N4) ijx-hyllp < ||x||p-h ||y||p 

(N4) implies: 11 |x| |p — | |y| |p| < | |x — y| |p, from which we deduce that the 
p-norm is a continuous mapping of R into R. 

The proof of (N4) is based on Holder and Minkowski inequalities. 

We dehne the unit sphere to be 

^; = {xGR^: ||x||^ = l}, 

and (p,pi) designate the conjugate pair, that is, - -f — = 1 for p > 1 and 

Pi > L 
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Holder inequality; 

< X*, X > < ||x*||pj|x||p for X* e and ^ ^ Ip 
or 

< X*, X > < ||x||p for X* E Sp^ and xElp, 
or 

< X*, X > < 1 for X* e S'” and x G S'”. 

Note that < x*, x > = — (x*)'x = x'x*, where x' is the trans¬ 

pose of the row vector x; further, < x*, x > represents a scalar product only 
when the conjugate pair {p,pi) = (2,2). The next result is an application of 
Holder inequality. 

Lemma 1; Let x E Ip, then there exists a norming functional 9 ?(x) G S'”^ 
such that < 99(x),x > = ||x||p = sup^* < x*,x > subject to x* G S'”^. 
Explicitly we have: 


99 (x) = {vj = sgn{xj) ) for p = 1 

= {vj = sgn{xj) I |^~^) for p > 1 

N^Np 

= Oq sgn{xa) for p = 00 , 

where {e^ : /3 = 1 ,..., n} designates the canonical basis and Xa = argmax^^^^ \xg\- 

Remark: In more general settings. Lemma 1 is proven as a corollary to 
the famous Hahn-Banach theorem, see for instance Kreyszig (1978, p.223). 

Example 1; Consider the vector x' = (1 2 — 1 — 2). 

a) If X G /|, then ||x ||2 = 10^/^ and <p(x) = ^ ^2 and < 

(p(x),x > = Explicitly ^(x)' = (12 - 1 - 2 )/ 10 ^/ 2 _ 

b) If X G If, then | |x| |i = 6 and <p(x) = sgn{x) E and < <p(x), x > = 6. 

Explicitly (p(x)' = (1 1 - 1 - 1). 

c) If X G /^, then ||x||oo = 2 and <p(x) = —64 G S '4 and < (p(x), x > = 2. 

Explicitly <p(x)' = (0 0 0 — 1). Another value is: <p(x)' = (0 1 0 0). 

d) If X G If, then ||x ||3 = 18^/^ and <p(x) = {vj = -j^ij 3 sgn{xj)) E Sf^ 

and < 99 (x),x > = 18^/^. Explicitly <p(x)' = (14 — 1 — 4)/18A3. 
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Let be the set of bounded linear maps (operators) from to 

which we identify with the set of m x n real matrices in the usual way. If 
A G then A' G Let A G be an operator, then 

its induced or subordinate norm is dehned to be 

||A||r^p = max{||Au||p : u G 5'”}. (1) 

Then 

IIA'IIpj^^j = max{||A'v||^^ : v G S'™}, (2) 

and the next theorem is a well known central result. 

Theorem 1; 


All — IIA'II 

-^Mr—M'^ Mpi— 

= max{v'Au ; u G S'” and v G S'™} 

= v}Aui for Ui G S'” and Vi G S'™ 

= v}ai = b}ui = Ai 


where 

and 


Aui = ai and vi = v^(ai) 
AVi= bi and ui = v^(bi). 


and the last two equations are known as transition formulas. 


(3) 

(4) 


Proof. For u G S'” and v G S'™, we consider the bilinear form 
A(u, v) = v'Au 

< ||Au||p by Holder inequality for Au G/” 

< max 11 Au| Ip = 11 A| by (1) (5) 

= ||Aui||p = v}Aui where vi = 99 (Aui) = y)(ai) ( 6 ) 

= max v'Aui by Lemma 1 

ves™ 

= max max v'Au. 

ves^ ue5" 
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Now using (5 and 6) and replacing A by A' we have 
||A||^^p = v'Aui 

= u'^AVi where Ui = ^^(AVi) = 9 ?(bi) 

- IIA'II 

M-^ I Ipi—) 

which is the required result. 

The transition formulas (3 and 4) can be represented by the following 
duality diagram 


on 

Oj. 




^ri 


A 


A' 


r tp 

i ^ 

cm 
^Pi 


Remark 1; 

a) The geometrical-statistical interpretation of Theorem 1 is that Ai is 
the largest dispersion value by which the operator A stretches an element 
of u G S'”; Ui is called the hrst principal axis of the rows of A, and ai 
represents the projected values of the rows of A on ui, and we name it the 
hrst projected row factor or the hrst principal component. And by duality, 
we also have Ai is the largest dispersion value by which the operator A' 
stretches an element of v G S'™; vi is called the hrst principal axis of the 
columns of A, and bi is the hrst column projected factor which represents 
the projected values of the columns of A on Vi. Essentially, we are computing 
the quintiplet (ai, bi, ui, vi, Ai). 

b) The vectors ai, bi, ui and vi belong to four diherent spaces: ai G /™, 
bi G , ui G S'” C /” and Vi G S'™ C /™. 

c) The transition formulas provide us an iterative algorithm to compute 
a maximum of {||Au||p ; u G S'”}; this maximum value can be a relative 
maximum. The norm ||A||,,^p corresponds to the absolute maximum. The 
algorithm is named the power method for Ip norm by Boyd (1974); Wold’s 
(1966) NIPALS (nonlinear iterative partial alternating least squares) algo¬ 
rithm, named also criss-cross regression by Gabriel and Zamir (1979), is a 
particular case. The algorithm can be summarized in the following way, 
where b is a starting value: 
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Step 1; u =99(b), a = Au and A(a) = | |a| 1^; 

Step 2: V =99(a), b = A'v and A(b) = ||b||^^ ; 

Step 3: If A(b) —A(a) >0, go to Step 1; otherwise, stop. 

The proof of the convergence of the algorithm is based on application of 
Holder inequality twice: Let and A^^^ for k > 1 represent the kth 

iteration values, then: 

y(fc) = 

< (v*'^^'A) 99(AV*'^^) by Holder inequality 

< <f{Au^^+^'>y{Au^^+^'>) byHdlder inequality 

The rows or the columns of A can be used as starting values for a or b. 

2.1 Particular norms 

Let A G H(/”, /™), then A' G H(/™, /”J. In general the conjugate pairs (r, ri) 
and {pi,p) are not equal, which implies that the geometry of the rows is 
different from the geometry of the columns. If (r, ri) = {pi,p) = {'r,p), then 
the geometric structure dehned on the rows of A is identical to the geometric 
structure dehned on the columns of A; this class was named transposition 
invariant by Choulakian (2005). 

For two particular values of the conjugate pairs (r, ri), explicit formulas 
are available; however, the spectral norm is the most well known, which is 
transposition invariant, 
a) (r,ri) = (oo, 1), then 

||A||oo^p = IIA'Ilp^^i by Thorem 1 

= max||Au|| subject to u G {—1,+1}"'. 

The proof is based on Holder inequality: For any v G S'™ consider 

||AV||i = u'A'v for u =s 5 'n(AV) G { —1,+!}"■ by Lemma 1, 

< ||Au||p for ug{ —1,+!}"' by Holder inequality, 

< max||Au||p for ug{—1,+!}"' 

= IIAuj^llp where ui = argmax ||Au||p subject to u G {—1,+1}"'. 
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By Lemma 1, if v =99(Aui), then ||A' = max{||A'v||i ; v G S'™} = 

maxu ||Au||p subject to u G {—1, +1}"', which is the required result, 
b) (r,ri) = (1,cxd), then 


lAI 




= 11 A'l |pj_j.oo by Theorem 1 


™ I I A 

= max 11 A: 

a=l 


*a\ Ip? 


where A*^ is the ath column of A. 

The proof is similar to the proof in a). For any v G S'™ consider 



< 

< 


u'AV 



for u = Oq sgn{xa) and Xa - 
for u = sgn{xa) and Xa 

for ui = argmax||AeQ,|| , 

11 " 


^ I A / I 

arff max A v 
^ g=i' 

= arg max A v 

13=1 


by Lemma 1, 
by Holder inequality, 


max ||A*q,|L 

a=l ^ 


By Theorem 1, if v =(^(Aui), then max{| | AV| |oo : v G S'™} = max™^]^ 11 A*a| |p 
maXu 11 Au| 1^ subject to u = for a = 1, n, which is the required result, 
c) For (r, ri) = (2,2), then 

|| A || 2^.2 = 11 A ' 112^.2 

= -^max(AA ) 

where Amax is the greatest eigenvalue of AA' or A'A, and it is named spectral 
norm. 

Drakakis and Pearlmutter (2009) and Lewis (2010) discuss the following 
nine cases of ||A||r_j.p for r,p = 1, 2, 3, which can be easily deduced from the 
above results. 


3 Matrix factorizations 

Let X G H(/”,/™). Let (ai,bi) be the hrst projected factors associated with 
Ai. We repeat the above procedure on the residual dataset 

= X-aib;/Ai (7) 

= (I^-P.JX 

= X(I„-P,J, 
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where Pa^ = aiv^/Ai is the projection operator on ai G because = 
Pai- Similarly, Pb^ = biuj/Ai is the projection operator on bi G /” . We 
note that the rankCX.^^^) = rank{X.)—l, because 

= 0 and X(i)'vi = 0; ( 8 ) 

which implies that 

u'^b 2 = 0 and v'^a 2 = 0 . (9) 

Equations (7,8,9) are known as Wedderburn’s rank one reduction formula, 
see Chu, Funderlic and Golub (1995). By repeating the above procedure we 
get the data reconstitution formula for the matrix X as a function of the 
projected row and column factor coordinates (aQ,,bQ,) associated with the 
dispersion values Aq,, for a = 1,..., k, and k = rank(X.), 

k 

X = a,b:,/A„ (10) 

or elementwise 

k 

Xij ^ ^ Qa(^)^a(j)/AQ. 
a.=l 

Equation (10) represents the decomposition of X based on —)■ induced 

norm. 

3.1 The case of X symmetric 

When the matrix X is symmetric, we can have a symmetric decomposition 
or a nonsymmetric factorization. 

a) If the norms are transposition invariant, that is, the conjugate pairs 
(r,ri) = (pi,p) = (r,p), then 


k 

X ^ ^ ^a^a./^cii 

a=\ 

for the geometric structure dehned on the rows of X is identical to the geo¬ 
metric structure dehned on the columns of X. 

b) If the norms are not transposition invariant, that is, the conjugate 
pairs (r,ri) 7 ^ (pi,p), then 



k 

X ^ ^ ^a^a/^a.1 

a=l 

for the geometric structure defined on the rows of X is different from the 
geometric structure dehned on the columns of X. 


3.2 A review 


Here, we review published discussed cases in the statistical literature. 

a) The centroid decomposition based on ||A||oo-i .2 = ||A'|| 2 -j.i; its transi¬ 
tion formulas are 

Aui = ai and vi = ai/A/a^^ 


and 


AVi= bi and ui = sgn(hi)-, 


it is the oldest to our knowledge. First used by Burt (1917), then by Thur- 
stone (1931) to factorize covariance matrices, and used extensively in the 
psychometric literature before the advent of the computers, see for instance 
Thurstone (1947), Horst (1965) and Harman (1967). Burt-Thurstone formu¬ 
lation was based on the following criterion; 


maxu'A'Au subject to u G {—1,-|-1}"'; (11) 


its relationship with the matrix norm formulation was shown by Choulakian 
(2003). In different, but related contexts, it is discussed by Galpin and 
Hawkins (1987), Chu and Funderlic (2002), Choulakian (2005), Kwak (2008), 
McCoy and Tropp (2011). Further, Choulakian (2012) considered it as a 
particular MAXBET procedure which takes into account the block structure 
of the variables. 

b) 11 A| 1 1.^1 = ||A'||oo^>oo is used by Calpin and Hawkins (1987); its tran¬ 
sition formulas are 

Aui = ai and vi = sgn{sii) 

and 


AVi= bi and ui = such that a = argmax|&ij| = arg max 11 A*jj | . 

j j 

c) The taxicab decomposition is based on ||A||oo-?.i = ||A'||oo^.i; its tran¬ 
sition formulas are 


Aui = ai and vi = sgn{sii) 
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and 


A'vi= bi and ui = sgn{hi). 

It is the most robust among all the transposition invariant induced norms 
considered in this paper, and is used extensively by Choulakian and cowork¬ 
ers in developing taxicab correspondence analysis: Choulakian (2004, 2006, 
2008a, 2008b, 2013, 2014), Choulakian et ah (2006, 2013a, 2013b, 2014). We 
also note that the taxicab decomposition of a covariance matrix is equiva¬ 
lent to the centroid decomposition of the centred dataset. The taxicab norm 
was hrst considered by Grothendieck, see the interesting story of the Groth- 
hendieck theorem and its many versions by Pisier (2012). Here, we cite this 
remarkable result 

Grothendieck Inequality: Let A = (a^) be a real matrix of size m x n; 
then for i = 1 ,.., m and j = 1 ,..., n 

m n 

max Sitj subject to {si,tj) E { — 1,1}^ 

1=1 J =1 

m n 

max EE ttij < Si,tj > subject to {si,tj) E si X si 

Si ,tj 

i=l j = l 

m n 

Kd max EE ttij < Si,tj > subject to (sj,tj) E si X S^2) 

S j ,t 7 

2 = 1 j = l 

where Kd is a universal constant that depends on d for d = 2, 3,..., but does 
not depend on m and n. By dehning Ai = 1, we see that Kd > Kd-i for 
d = 2,3, .... The open problem is that there exists a universal constant Kq 
such that 


lAI 


00^1 — 


< 


Kg = inf Kd such that the inequality in (12) is true. 

d 

It is conjectured that 

1.67695 < Kg< - - -^ = 1.78221. 

-21og(l + y2) 

An elementary proof of the inequality is given by Blei (1987) or Jameson 
(1987). A randomization algorithm to compute 11 A| via the Grothendieck 
inequality is studied by Alon and Naor (2006), and it is used by McCoy and 
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Tropp (2011) to compute (11). Rohn (2000) shows that the computation of 
||A 11 00^-1 is NP-hard. 

d) The singular value decomposition, SVD, is the standard decomposi¬ 
tion, the most used and studied; it is based on ||A|| 2^.2 = ||A'|| 2 -j. 2 , see Horn 
and Johnson (1990) and Golub and Van Loan (1996). Its transition formulas 
are 

Aui = ai and vi = ai/A/a)^ 

and 

A'vi= bi and Ui = bi/ A/b(bi. 


Example 2: Let us compute a few decompositions to the following ma¬ 
trix 

1 - 2 \ 

-2 4 (13) 

0 2 / 

a) Taxicab decomposition; ||X||oo^i is attained at one of the axes: u' = 
(1 1) or (1 — 1). For u'= (1 1), (Xu)'= (—1 2 2), and ||Xu||i = 5. For 
u'= (1 — 1), (Xu)'= (3 —6 — 2), and I|Xu||i = 11. So, u'^ = (1 — 1), 
a( = (3 - 6 - 2), v( = sgn{a!^) = (1 - 1 - 1), b( = (X'vi)' = (3 - 8 ), 
Ai = 11 = ||ai||i = a(vi = ||bi||i = b(ui. Note that ui = sgnihi). Now the 
residual matrix, X*^^)= X — aib(/Ai, is 

/ 2 2 \ 

XW = -4 -4 /ll, 

\ 6 6 / 

which is of rank 1. Repeating the above calculations on X^), we hnd u '2 = 
(1 1), a '2 = (1 - 2 3) 4/11, = sgn{a!^) = (1 - 1 1), b^ = (X'v 2 )' = 

(1 1) 12/11, A 2 = 24/11 = ||a 2 ||i = a' 2 V 2 = ||b 2 ||i = b 2 U 2 . Note that 
U 2 = sgn{h 2 )- Now the residnal matrix, X*^^)= a 2 b' 2 /A 2 = 0. So we 

have the following decomposition 

X = (3 -6 -2)'(3 - 8 )/ll + (l -2 3)'(1 1)2/11. (14) 

b) Centroid decomposition: ||X||oo ^.2 is attained at one of the axes: u' = 

(1 1) or (1 — 1). For u'= (1 1), (Xu)'= (—1 2 2), and ||Xu ||2 = \/9. 

For u'= (1 — 1), (Xu)' = (3 —6 — 2), and ||Xu ||2 = 7. So, u'^ = (1 — 1), 
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a; = (3 -6 - 2 ), v; = a;/Ai = (3 -6 -2)/7, b; = (XVi)' = (15 -34)/7, 

Ai = 7 = ||ai ||2 = a'^vi = ||bi||i = b'^ui. Note that ui = sgn{hi). Now the 
residual matrix, X — aib(^/Ai, is 

/ 4 4 A 

X(^) = -8 -8 /49, 

\ 30 30 / 

which is of rank 1. Repeating the above calculations on X*^^\ we hnd U 2 = 

(1 1), a'=(2 -4 15) 4/49, v' = a'/||a 2||2 = (2 -4 15) /(7y5), 

b 2 = (X'V 2 )' = (1 1) 2\/5/7, A 2 = 4\/5/7 = ||a 2||2 = a 2 V 2 = ||b 2 ||i = b 2 U 2 . 

Note that U 2 = sgn{h 2 )- Now the residual matrix, X*'^^= a 2 b 2 /A 2 = 0. 

So we have the following decomposition 

X=(3 -6 -2)'(15 - 34)/49 + (2 -4 15)'(1 1)2/49. (15) 

c) Extreme decomposition: ||X||i_j.oo is attained on one of the canonical 
basis vectors: e'^ = (1 0) or 62 = (0 1). For u = ei, (Xu)' = (1 —2 0), and 
||Xu||oo = 2 . For u = 62 , (Xu)' = (—2 4 2 ), and ||Xu||oo = 4. So, ui = 62 , 
ai = X *2 the second column of X, v( = (0 1 0), b'^ = (X'vi)' = (—2 4), 

Ai = 4 = ||ai||oo = a'^vi = ||bi||oo = b'^ui. Note that ui = e 2 sgnihi^). Now 
the residual matrix, X*^^^= X — aib'^/Ai, is 

/O 0 \ 

xW =00 /ll, 

Vi 0 ; 

which is of rank 1. Repeating the above calculations on X^), we hnd U 2 = ei, 
a '2 = (0 0 1), V2 = a2, b '2 = (X'v2)' = (1 0), A 2 = 1 = ||a2||oo = 

a' 2 V 2 = ||b 2 ||oo = b' 2 U 2 . Note that U 2 = eisgn{h 2 i). Now the residual matrix, 

X( 2 )= x(^)-a 2 b' 2 /A 2 = 0 . So we have the following decomposition 

X= (- 2 4 2)'(-2 4)/4 + (0 0 1)'(1 0). (16) 

d) Singular value decomposition: It is based on the eigenvalues and eigen¬ 

vectors of the symmetric matrix X'X: Ai = 5.3191, v( = (—0.4197 0.8393 0.3455), ai 
Aivi, ui = (-0.3945 0.9189) and bi = Aiup A 2 = 0.8408, v' = (-0.1545 0.3090 - 
0.9384), ai = AiVi, U 2 = (—0.9189 — 0.3945) and b 2 = A 2 U 2 . So we have 

the following decomposition 

X = (-0.4197 0.8393 0.3455)'(-0.3945 0.9189)5.3191 + 

(-0.1545 0.3090 - 0.9384)'(-0.9189 - 0.3945)0.8408. (17) 
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Remark 2: 

a) We note that the factors (ac, bo.) are determined up to proportionality, 
and the four decompositions in equations (14) through (17) of the data set X 
given in (13) are essentially different. This is a much discussed and important 
topic, named factor indeterminacy problem; see for instance Mulaik (1987). 
We can recast or reformulate the factor indeterminacy problem within a 
geometric setting: If Rank(K) > 2, then there are inhnite number of different 
factorizations depending on the values of r > 1 and p > 1 for X G R(/”, /™). 

b) The decomposition of X is essentially unique (up to proportionality) 
if and only if rank(X.) = 1. 

c) Conditions for essential uniqueness of decompositions for three-way 
arrays or tensors is an active area of research; and Kruskal’s sufficiency the¬ 
orem is the most famous general result, see Rhodes (2010). For an overview 
of the literature on tensor decomposition, see the interesting review by Ten 
Berge (2011). 

3.3 A family of Euclidean multidimensional scaling mod¬ 
els 


Let A = [Sij) be a symmetric n x n matrix with nonnegative elements 
and zeros on the diagonal, representing the dissimilarities of n objects. The 
aim of multidimensional scaling (MDS) techniques is to find a conhguration 
of the n points, which best match the original dissimilarities as much as 
possible. We shall consider the framework of the classical MDS, named 
also principle coordinate analysis, where we suppose that the dissimilarities 
represent Euclidean distances, which implies that there exists a set of n 
centered points in a Euclidean space, denoted by {f* : i = 1, ...,n}, such that 


We thus have the following well known relationship 

Q = -i(HA"H) 

= F'F, 

where F = [f^, ...,f„] and H = is the centering matrix, I„ the 

identity matrix and the vector of ones. So the matrix Q is positive semi- 
dehnite, and it is equal to F'F, where F is unknown. Now suppose that 
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F G Then 


||F||r^p = max{||Fu||p : u G S'”} 

can be expressed as a linear finction of Q if p = 2 and r > 1; that is 

||F||^^2 = u'F'Fu subject to u G S'” 

= u'Qu subject to u G S'”. (18) 

Factorizing Q into F'F by (18), we obtain a family of Euclidean multidimen¬ 
sional scaling (MDS) models as a function of r > 1. Three particular cases 
are worthy of mention: 

a) For r = 2, we obtain the classical MDS, see for instance Torgerson 
(1952) and Gower (1966). Each fa is an eigenvector of Q; see also subsection 
2.1 part c). 

b) For r = oo, we get the centroid MDS, where we maximize the Burt- 
Thurstone criterion (11), see also subsection 2.1 part a). 

c) For r = 1, we get the dominant MDS; its computation is extremely 
simple and fast, see subsection 2.1 part b). 

Example 3: We consider the Facial Expressions data found in Borg and 
Groenen (2005, p. 76) of dimension 13x13, where n = 13 is the number of 
person’s facial expressions. The aim of the study is the correct identihcation 
of intended emotional message from a person’s facial expression. Further¬ 
more, Table 4.3, p. 75 in Borg and Groenen, provide Schlosberg empirical 
scale values that classify the facial expressions into three classes: pleasant- 
unpleasant (PU), attention-rejection (AR) and tension-sleep (TS). Borg and 
Groenen (2005, subsection 4.3) found that the hrst two dimensions of ordi¬ 
nal MDS reproduced quite accurately the three classes: the hrst dimension 
representing PU and the second dimension representing AR and TS, because 
the correlations between the hrst two calculated dimensions and the Schlos¬ 
berg empirical scale values are quite high for ordinal MDS. Table 1 compares 
the correlation values obtained by four MDS approaches; the ordinal MDS 
correlation values are reproduced from Borg and Groenen (2005, p.77. Table 
4.6): The centroid MDS produced results as good as the ordinal MDS. 
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Table 1: Facial Expressions Data: Correlation values. 


MDS 

corr(DIMl,PU) 

corr(DIM2,AR) 

corr(DIM2,TS) 

ordinal 

0.94 

0.86 

0.87 

classical 

0.91 

0.80 

0.83 

centroid 

0.93 

0.86 

0.89 

dominant 

0.91 

0.78 

0.70 


4 The French school of data analysis 

Benzecri (1973), who was a pure mathematician in geometry in the 1950s, 
is considered the father of the french school of data analysis; he developed a 
geometric generalized Euclidean framework for multidimensional data analy¬ 
sis by introducing two metric matrices (square and positive dehnite) M and 
N. In this setting, the duality diagram of Theorem 1 becomes 

A 

^n(N)^ /- 

in ^ ^-(M) 

A' 

where 

^”(N) = {x G R ; x'Nx = 1}, 

and 

,S™(M) = {x G R ; x'Mx = 1}. 

Note that, for N = R, then 5*2 (R) = 5'R In this particular Euclidean setting, 
the duality diagram represents the following transition fomulas 

Aui = ai and vi = MaR A/a'^Mai) 

and 

A'vi= bi and ui = Nbi/A/b(Nbi). 

The solution of the last two equations can be reexpressed as a general¬ 
ized eigenvalue-eigenvector problem in the following way: From the last two 
equations we get 

ai = Aui 

= ANbi/RRNbi) 

M-RiRa;Mai) = ANA'vi/RRNbi), 
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from which one gets 


MANA'vi = AiVi; 


and similarly 


NA'MAui = AiUi. 


In the last two equations the eigenequations are functions of principal axes 
Ui and vi- However one can reexpress them as functions of projected factor 
scores 

ANA'Mai = Aiai; 

and similarly 

NA'MANbi = Aibi. 


5 Conclusion 

We embedded the ordinary SVD into a larger family based on induced matrix 
norms, and provided the transition formulas and a simple criss-cross iterative 
procedure to compute the principal axes and principal factor scores. Given 
that there are infinite number of SVD like decompositions, depending on the 
underlying induced norms, one is tempted to ask which is the best? 

It is quite ironic that the centroid decomposition, the oldest method, 
was recently rediscovered and restudied as a robust method, see Choulakian 
(2005), Kwak (2008), McCoy and Tropp (2011), after being dumped almost 
sixty years ago for the following reason given in Hubert et ah (2000, p.76) 
’’Comments in Guttman (1944) and elsewhere (e.g., Horst (1965) and Har¬ 
man (1967)) with regard to this centroid strategy generally considered it 
a poor approximation to what could be generated from Hotelling’s method 
that would choose successive unit length vectors to produce a rank reduction 
by identifying (through an iterative strategy) the eigenvector associated with 
the largest eigenvalue for each of the residual matrices successively obtained. 
At the time, however, the centroid method was computationally much less 
demanding than Hotelling’s iterative (or power) method for obtaining each 
of the principal components (again, one-at-a-time and reducing rank at each 
iteration); for this reason alone, the centroid method was a very common 
factorization strategy until electronic computing capabilities became more 
widely available”. This comment shows that the centroid method was a vic¬ 
tim of the habit of using mathematical methods in statistics based on optimal 
criteria, as if optimality is a guarantee of efficiency. 
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The arguments advanced by Benzecri on the advantages of the Euclidean 
geometry over the taxicab geometry for multidimensional data analysis are 
both computational and metaphysical. On the use of LI distance in data 
analysis, Benzecri (1977, page 13) commented in the following way ” elle 
(LI) ne permet pas d’utiliser la geometrie euclidienne multidimensionnelle; 
elle donnera des resultats qui qualitativement ressembleront a ceux obtenus 
par la distance...quadratique; mais au prix de calculs plus compliques et 
sous une forme moins commode. Sans permettre a I’outil mathematique de 
dehgurer le reel, on doit lui conceder que la transmission a 1’esprit humain 
d’un vaste ensemble de donnees synthetise (resume; rendu perceptible par le 
calcul) ait ses lois propres. (On se souvient que le primat de la geometrie 
euclidienne est admis par Torgerson)”. The ease of computation argument 
is very similar to Gauss’s argument in the adoption of least squares criterion 
in the linear regression model. While the metaphysical argument, if my 
understanding is correct, is that: the transmission to the human spirit of a 
synthesis of a collection of data has its proper laws, which are based on the 
Euclidean geometry. 

Ten Berge (2005, personal communication) thought that the centroid 
method produced good results, but it was not mathematically well under¬ 
stood during the last century. Benzecri (1973b, page 1 in Avant-propos) 
considered data analysis an experimental science, and that he has a predelic- 
tion for the case studies in his books. A similar thought is also found in 
Tukey: ’’The test of a good procedure is how well it works, not how well it 
is understood”. 

As to the question asked which decomposition is the best? Mathemati¬ 
cally, the SVD is the best and the reference, but quite sensitive to outlying 
observations: So, we suggest the joint use of SVD and the taxicab decompo¬ 
sition, or, the SVD and the centroid decomposition. 
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